I. Goals


The  effort  has  three  major  components. 

(1) Gain algorithm experience in conversion of a suite of Air Force production CFD codes
to a general format applicable to a variety of such commercial architectures.

(2) Examine the feasibility of using workstation networks for such distributed computation;
this involves (a) developing timing models of the communication systems of such networks, (b)
projecting performance of the above codes on such networks, and (c) implementing one or more
codes, as time permits.

(3) Initiate research on CFD-based low-radar cross-section analysis on parallel systems; this
effort is in association with Dr. Joseph Shang at WRDC.


II.  Progress  Report 


1989-1990  Progress 

The  following  were  grant-sponsored  accomplishments. 

(1) Implicit algorithm development. A full 3-D Navier-Stokes Beam-Warming CFD code
was implemented on a 1024-node scalar NCUBE hypercube at SANDIA (Albuquerque).

(2) Distributed-workstation architectures for CFD. The University is providing a cluster
of IBM workstations for distributed algorithm study. Critical timing features of such a system are
being measured for insertion into the overall timing models associated with the completed explicit and
implicit AFFDL Navier-Stokes codes; their parallel performance will then be predicted.

(3) Generic distributed parallel codes. Commercial operating systems are available
which permit algorithm coding toward a distributed parallel environment that includes most current
MIMD systems. The EXPRESS system from Parasoft has been adopted, and the completed
explicit and implicit AFFDL Navier-Stokes codes are being adapted to this software environment.

(4) Cross-section analysis. Numerical procedures recently proposed by Dr. Shang, based
on CFD methods, for solving Maxwell's equations in real time are being examined for their
solvability on parallel systems. This effort will begin in earnest in summer 1991.


1990-1991 Progress

The following were grant-sponsored efforts.

(1) Distributed-workstation architectures for CFD. At the time of this grant
initiation, the only computing resources in Dr. Shang's group with potentially scalable, parallel
features were a collection of graphic workstations. We decided, after completion of the above
implicit code conversion, to examine the feasibility of using workstation networks for such
distributed computation; this involves (a) developing timing models of the communication systems
of such networks, (b) projecting performance of the above codes on such networks, and (c)
implementing one or more codes, as time permits.

The University had provided a cluster of IBM workstations for distributed algorithm study.
Unfortunately, it was found that, as message-passing architectures, the models provided had long
latency times even when connected in a local network and, in the general network environment
provided at the University, these latencies would wither any but an embarrassingly parallel
algorithm. It was clear that further research with locally-available network technology would
have no value in demonstrating to Dr. Shang the usefulness of networking his workstations. Also,
an effort by the University to implement the EXPRESS distributed system on these IBM units was
unsuccessful. For these reasons, this research effort was abandoned.
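
The effect of latency described above can be illustrated with a simple linear timing model, in which each message costs a fixed startup time plus a per-byte transfer time. The sketch below is illustrative only: the function names and all numerical values are assumptions chosen for the example, not measurements from the IBM cluster.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: fixed startup latency plus transfer time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, n_messages, n_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of a step spent on useful computation, assuming computation
    and communication do not overlap (a simplifying assumption)."""
    comm_s = n_messages * message_time(n_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Hypothetical case: 10 ms of computation per step, 4 messages of 1 KB each,
# 10 MB/s links, with a short (100 microsecond) and a long (5 ms) startup time.
low_latency = parallel_efficiency(0.010, 4, 1024, 100e-6, 1e7)
high_latency = parallel_efficiency(0.010, 4, 1024, 5e-3, 1e7)
```

With these assumed numbers the long-latency case spends most of each step waiting on messages, which is why only algorithms that exchange almost no data would survive on such a network.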

(2) CFD-based low-radar cross-section analysis on parallel systems. Numerical
procedures recently proposed by Dr. Shang, based on CFD methods, for solving Maxwell's equations
in real time were examined for their solvability on parallel systems. Dr. Shang forwarded a 2-D
code for study. It was found that the algorithm kernel involved a forward-substitution process,
which, with little computational complexity, was deemed inappropriate for message-passing
architectures. Some investigation was made of the CM-2 because of its NEWS high-speed
interconnect. However, it was later felt by Dr. Shang that the sample algorithm was not extendable
to general 3-D problems, and work was suspended awaiting a new sample serial code.
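
The difficulty with a forward-substitution kernel can be seen in a generic sketch of the process for a lower-triangular system L x = b. This is a textbook formulation offered for illustration, not Dr. Shang's code: each unknown depends on previously computed unknowns, so the steps must run in order, and each step performs only a few arithmetic operations relative to the data it would have to receive from other processors.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular matrix L given as a list of rows."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # x[i] needs every x[j] with j < i: a sequential dependency chain.
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x
```

If the rows of L were distributed across processors, each x[i] would have to be communicated before the next row could proceed, so message startup time rather than arithmetic would dominate.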

(3)  In  June,  Dr.  Shang  informed  us  that  he  had  received  a  start-up  effort  to  exploit  the 
DELTA  in  his  work,  and  that  he  wished  our  assistance  in  the  work.  Most  students  familiar  with 
these  codes  had  left  our  project,  so  it  was  decided  that 

(a)  his  staff  would,  beginning  with  the  above-mentioned  NCUBE  explicit  code,  carry  out  a 
conversion  to  the  DELTA  of  a  newer  explicit  code; 

(b)  we  would  evaluate  the  feasibility  of  converting  his  implicit  code  from  the  NCUBE^  to 
the  DELTA. 

Regarding the latter, it was understood that the syntactic conversion would be trivial. However,
the NCUBE algorithm was converted from the serial form specifically to minimize the number of
hops in a hypercube interconnect. The price paid was a significant increase in the number of
messages in the NCUBE version. The relatively low message latency in the NCUBE hardware
had resulted in a 60% parallelization efficiency. It was obvious that the DELTA would be
relatively more affected by latency, and an initial evaluation has led us to look elsewhere for
efficient implicit kernels which could be interfaced with the FDL code. We visited the Parallel
Systems Division at NASA/ARC for discussions with a researcher engaged in similar activities on
the INTEL GAMMA. By October 15, the end of this reporting period, we had not obtained access
to the DELTA, but we had performed some preliminary generic NCUBE-INTEL conversion on the
Argonne GAMMA.
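
The tradeoff described above, fewer hops at the price of more messages, can be sketched with a simple cost model. The model form (a fixed per-hop cost) and all parameter values below are assumptions for illustration and are not NCUBE or DELTA measurements; the point is only that a scheme using many short messages is penalized more heavily as message startup time grows.

```python
def comm_time(n_messages, hops, n_bytes, startup_s, per_hop_s, per_byte_s):
    """Total time for n_messages identical messages, each paying a startup
    cost, a per-hop routing cost, and a per-byte transfer cost."""
    return n_messages * (startup_s + hops * per_hop_s + n_bytes * per_byte_s)


def compare(startup_s):
    """Compare two hypothetical schemes that move the same 64 KB of data."""
    # Scheme A: many short messages to nearest neighbors (1 hop each).
    many_short_hops = comm_time(32, 1, 2048, startup_s, 2e-5, 1e-7)
    # Scheme B: fewer, larger messages that travel farther (8 hops each).
    few_long_hops = comm_time(8, 8, 8192, startup_s, 2e-5, 1e-7)
    return many_short_hops, few_long_hops
```

Under these assumed values, the many-message scheme is faster when the startup time is 10 microseconds, but the ordering reverses when the startup time is 100 microseconds.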

We obtained access to the DELTA in late November and attempted to implement our simpler
NCUBE explicit code on the DELTA. We have encountered a number of system problems, as
well as I/O programming issues due to the large available local memories on the DELTA, in
contrast to the NCUBE.

In summary, we foresee a number of programming and algorithmic issues in achieving a
state-of-the-art implementation of the FDL implicit code on the DELTA, and we are now evaluating which
would be appropriate for our grant to study or import from ARC.


^This was the version of the NCUBE with 512K bytes/node; it was not the NCUBE2.


1991-1992 Progress

The following were grant-sponsored efforts.


(1) CFD-based Computational Electromagnetics (CEM) on parallel systems.
Numerical procedures recently proposed by Dr. Shang, based on CFD methods, for solving
Maxwell's equations in real time were examined for their solvability on parallel systems. In
previous years of grant effort, Dr. Shang forwarded 2-D and 3-D explicit CEM codes for study.
These were found by Dr. Shang to have numerical problems and were put aside before
parallelization.

In the summer of 1992, Dr. Shang forwarded a new suite of three CEM codes for parallelization.
An attempt to port these to a recently-purchased KSR at the University was put aside when it
became clear that the level of KSR compiler support would not permit efficient parallelization. It
was agreed with Dr. Shang that remaining effort^ should be spent on the DELTA, which had
achieved a reasonable level of hardware stability and compiler efficiency. Experience on the KSR
was useful in giving insight, however. In the process of preparing a code to exploit the KSR's
automatic parallelization ("tiling"), a version of the code was developed which could be readily
converted to a message-passing machine like the DELTA.

As a result, a two-step algorithm- and code-development procedure was developed. In step (1),
Professor Calahan carried out most of the parallelization on a reliable uniprocessor mainframe with familiar
and sophisticated debugging tools; the appropriate DELTA message-passing libraries were
emulated where necessary. In step (2), this code was converted to the DELTA, principally a
syntactic step, involving Dr. Shang's CEM staff at WRDC. When the grant terminates, these
application researchers will then be able to carry on independently. Student assistants at the
University are also involved in this final parallelization step. It is expected that these three codes
will be completely parallelized by 3/31/93. A paper abstract on this topic, joint with WRDC, has
been submitted [2].
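
The emulation idea in step (1) can be sketched as follows: send and receive calls are replaced by in-memory queues, so that the parallel logic can be debugged within a single process before the syntactic conversion of step (2). The class and method names below are hypothetical and do not correspond to the actual DELTA library interface; Python is used here only for brevity.

```python
from collections import deque


class EmulatedNetwork:
    """In-process stand-in for a message-passing library (hypothetical API)."""

    def __init__(self, n_nodes):
        # One FIFO mailbox per (source, destination) pair.
        self.mailboxes = {(s, d): deque() for s in range(n_nodes) for d in range(n_nodes)}

    def send(self, src, dst, data):
        self.mailboxes[(src, dst)].append(data)

    def recv(self, src, dst):
        # Raises IndexError if no message is waiting, which exposes
        # ordering bugs that would hang on the real machine.
        return self.mailboxes[(src, dst)].popleft()


def exchange_boundaries(net, subdomains):
    """Each emulated node sends its last value to its right neighbor,
    which receives it as a left ghost value."""
    n = len(subdomains)
    for rank in range(n - 1):
        net.send(rank, rank + 1, subdomains[rank][-1])
    ghosts = [None] * n
    for rank in range(1, n):
        ghosts[rank] = net.recv(rank - 1, rank)
    return ghosts
```

Because the emulated nodes are visited in a loop within one process, ordinary serial debugging tools apply; only the `send` and `recv` bodies would need to change for the target machine.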

(2) Distributed CFD implicit code. Based on experience with the above-mentioned
two-step process, it was felt reasonable to re-institute a project to parallelize a prototype implicit CFD
code for the DELTA; a previous parallelization for the NCUBE [1] was deemed inappropriate due
to the relatively long message startup of the DELTA. This project had languished due to the inability
to find a student sufficiently experienced to carry out the somewhat involved parallelization. It
is now felt that the above two-step parallelization process, involving Professor Calahan in the
emulation step, will make parallelization possible with modest student and WRDC help in the final
parallelization step. Again, WRDC involvement will have an important educational value.

We  now  have  in  hand  the  most  recent  implicit  N-S  production  code  from  WRDC.  Successful 
parallelization  will  permit  DELTA  or  PARAGON  solution  within  the  3-year  period  of  a  DARPA 
contract  with  the  WRDC  CFD  group. 


^The grant was scheduled to terminate on 10/14/92. A 1-year no-cost extension has
been approved.


1992-1994 Progress


In joint work with Dr. Shang at WRDC in the Fall of 1992, two serial CEM codes were
restructured in generic serial formats suitable for easy implementation on distributed parallel
architectures; also, performance projections were made based on knowledge of the DELTA
architecture. A total of six generic programs were developed, depending on the number of geometric
directions to be partitioned (i.e., there were 1-D, 2-D, and 3-D versions of each code). It was later
decided that the numerical characteristics of one code (the implicit) were not suitable, so effort was
continued only on one code. The 1-D and 2-D versions of this explicit code were then parallelized
on the DELTA, and performance data reported in [2]. Another CEM code was then received from
Dr. Shang in mid-summer and its parallelization reported in [3].
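
The 1-D, 2-D, and 3-D versions differ in how many grid directions are split among processors. A minimal sketch of such a block partition follows; the function names and the grid and processor counts are assumptions for illustration and are not taken from the codes themselves.

```python
from itertools import product


def split_range(n_points, n_parts):
    """Split range(n_points) into n_parts contiguous, nearly equal blocks."""
    base, extra = divmod(n_points, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partition_grid(shape, procs_per_dim):
    """Return one tuple of (start, stop) index ranges per processor.

    A direction with procs_per_dim[d] == 1 is left unpartitioned, so
    (4, 1, 1), (2, 2, 1), and (2, 2, 2) give 1-D, 2-D, and 3-D partitions.
    """
    per_dim = [split_range(n, p) for n, p in zip(shape, procs_per_dim)]
    return list(product(*per_dim))
```

For example, `partition_grid((8, 8, 8), (2, 2, 1))` yields four blocks, each covering half of the first two directions and the full third direction.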


III.  Coupling  Activities 


1989-1990 

Air  Force  Flight  Dynamics  Laboratory 

The  implicit  code  parallelized  by  Kominsky  (above)  was  a  production  CFD  obtained  from  Dr. 
Joseph  Shang,  director  of  the  Computational  Aerodynamics  Group  at  AFFDL.  One  visit  and 
monthly  contacts  were  made  to  his  laboratory.  This  completed  a  study  initiated  in  a  previous 
AFOSR  grant  to  develop  distributed  parallel  versions  of  principal  production  CFD  codes  in 
Shang's group.


1990-1991

Air Force Flight Dynamics Laboratory

The  implicit  code  parallelized  by  Kominsky  (above)  was  a  production  CFD  obtained  from  Dr. 
Joseph  Shang,  director  of  the  Computational  Aerodynamics  Group  at  AFFDL.  Monthly  contacts 
were  made  to  his  laboratory  in  regard  to  conversion  of  the  NCUBE  explicit  code  to  the  DELTA. 


1991-1992

Air Force Flight Dynamics Laboratory

Bi-monthly visits are made to WRDC to discuss the above-mentioned CEM and CFD codes.


1992-1994

Air Force Flight Dynamics Laboratory

A number of visits were made to WRDC to discuss parallelization of CEM codes.

Phillips Laboratory, Kirtland AFB

A visit was made to determine the extent to which the interests and experience of the PI might relate
to their research in parallel computation.